Backoff algorithms

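While studying my company's service discovery framework, I noticed that it uses the exponential jitter backoff algorithm, so I took the chance to learn about it.

The purpose of this kind of algorithm is to choose a good retry interval. For example, when a service is temporarily offline, clients have to keep retrying. How should the interval between retries be set?

The common "retry -> wait -> retry" pattern is called backoff. So what is backoff?

In a retry scenario, it means using some algorithm to find a reasonable time to wait before retrying, instead of having every failed request retry immediately and all at once.

Most retry and retransmission scenarios use a backoff algorithm to reduce collisions and avoid wasting resources.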

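Common backoff algorithms

A backoff almost always needs a maximum number of retries or a maximum interval. Retrying forever is rarely meaningful, and an overly long interval delays the reaction to the downstream service's recovery.

Common ways to implement backoff are listed below: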

fixed backoff

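Backoff with a fixed interval: every retry waits for the same interval.

Pros: very simple to implement.
Cons: the interval is hard to choose. If it is too small, a long downstream outage can waste a lot of resources; if it is too large, data is not delivered promptly after an occasional network blip.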
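To make the idea concrete, here is a minimal sketch of a fixed backoff retry loop that also enforces a maximum number of attempts. The names retryFixed and errUnavailable are my own and only serve the illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnavailable is a hypothetical error used to simulate a failing downstream.
var errUnavailable = errors.New("temporarily unavailable")

// retryFixed calls op until it succeeds or maxAttempts is reached, sleeping
// the same interval between failed attempts. It returns the number of
// attempts made and the last error (nil on success).
func retryFixed(op func() error, interval time.Duration, maxAttempts int) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1 // always try at least once
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = op(); err == nil {
			return attempt, nil
		}
		if attempt < maxAttempts {
			time.Sleep(interval) // fixed backoff: the wait never changes
		}
	}
	return maxAttempts, err
}

func main() {
	calls := 0
	// Simulated operation that fails twice and then succeeds.
	op := func() error {
		calls++
		if calls < 3 {
			return errUnavailable
		}
		return nil
	}
	attempts, err := retryFixed(op, 10*time.Millisecond, 5)
	fmt.Println(attempts, err)
}
```

The only tunable here is interval, which is exactly the value that is hard to pick well.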

random backoff

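Given a maximum wait time maxInterval, simply pick a random wait time from the range [0, maxInterval). It is a rather brute-force approach.

Pros: very simple to implement, and it avoids collisions fairly well.
Cons: also obvious. A single transient network blip may end up waiting quite a long time before the retry succeeds.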
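A minimal sketch of picking such a random interval, assuming the standard math/rand package. The function name randomInterval is my own.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// randomInterval returns a uniformly distributed wait time in [0, maxInterval).
// A non-positive maxInterval yields 0, because rand.Int63n panics for n <= 0.
func randomInterval(maxInterval time.Duration) time.Duration {
	if maxInterval <= 0 {
		return 0
	}
	return time.Duration(rand.Int63n(int64(maxInterval)))
}

func main() {
	for i := 0; i < 5; i++ {
		fmt.Println(randomInterval(100 * time.Millisecond))
	}
}
```

Every call is independent of the attempt number, which is why an unlucky draw can make a trivial failure wait almost the full maxInterval.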

fibonacci backoff

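Backoff based on the Fibonacci sequence. It avoids collisions fairly well and reacts quickly to brief downstream failures.

Core algorithm:

// prevPrev and prev hold the two most recent intervals.
next := prev + prevPrev
prevPrev, prev = prev, next
return next

The output is: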

=== RUN   TestFibonacci_Next
0ms
10ms
10ms
20ms
30ms
50ms
80ms
130ms

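Pros: data delivery resumes quickly. It suits cases where downstream failures recover quickly or the failure rate is low.
Cons: it wastes resources when the downstream failure lasts a long time.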
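The core algorithm above is only a fragment, so here is one way it could be wrapped into a runnable type. This is my own sketch, not the implementation behind the test output: the Fibonacci type, the NewFibonacci constructor and the max cap are assumptions, chosen so that the sequence starts at 0 and is scaled by a base unit.

```go
package main

import (
	"fmt"
	"time"
)

// Fibonacci yields intervals that follow the Fibonacci sequence scaled by a
// base unit. prevPrev is the interval handed out next, prev the one after it.
type Fibonacci struct {
	prevPrev, prev time.Duration
	max            time.Duration // upper bound for any returned interval
}

// NewFibonacci starts the sequence at 0, unit, unit, 2*unit, ...
// It assumes unit > 0 and max > 0.
func NewFibonacci(unit, max time.Duration) *Fibonacci {
	return &Fibonacci{prevPrev: 0, prev: unit, max: max}
}

// Next returns the current interval and advances the sequence.
// Once the sequence reaches max it stays there.
func (f *Fibonacci) Next() time.Duration {
	cur := f.prevPrev
	if cur >= f.max {
		return f.max
	}
	f.prevPrev, f.prev = f.prev, f.prevPrev+f.prev
	return cur
}

func main() {
	f := NewFibonacci(10*time.Millisecond, time.Second)
	for i := 0; i < 8; i++ {
		fmt.Println(f.Next())
	}
}
```

The cap reflects the earlier point that a backoff needs a maximum interval. Without it the sequence would keep growing for as long as the downstream stays down.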

exponential backoff

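Exponential backoff means that the interval between retries grows exponentially with each attempt. Why exponential growth?

[Figure: probability of an event occurring versus time interval]

As the figure shows, the probability of the event drops sharply as the interval gets longer, decaying exponentially. With exponential backoff, the interval grows as the number of retries increases, so the probability of a collision becomes very low. That makes it a good fit as a backoff implementation.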

Core code:

// minInterval: the initial interval, e.g. 10ms
// factor: the exponential factor, e.g. 2
// attempts: the number of retries so far
next := float64(minInterval) * math.Pow(factor, attempts)

The output is:

=== RUN   TestExponential_Next
10ms
20ms
40ms
80ms
160ms
320ms
640ms
1280ms
...

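Pros: fairly simple to implement, with a low probability of collision. Early retries complete within a relatively short time. When a service is down for a long time, retries settle into a steady long interval and do not burn system resources for nothing.
Cons: when most nodes happen to fail at the same moment, every node computes the same sequence of retry intervals, so their retries coincide and collisions become likely.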
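Here is a minimal runnable sketch of the formula, with the maximum interval applied as a cap. The function name exponentialInterval and its parameter order are my own. It assumes factor >= 1 and attempt >= 0.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// exponentialInterval returns minInterval * factor^attempt, capped at maxInterval.
func exponentialInterval(minInterval, maxInterval time.Duration, factor float64, attempt int) time.Duration {
	next := float64(minInterval) * math.Pow(factor, float64(attempt))
	// Comparing as float64 also catches +Inf and values that would overflow int64.
	if next > float64(maxInterval) {
		return maxInterval
	}
	return time.Duration(next)
}

func main() {
	for attempt := 0; attempt < 8; attempt++ {
		fmt.Println(exponentialInterval(10*time.Millisecond, time.Second, 2, attempt))
	}
}
```

The function is deterministic: two nodes that fail at the same moment get exactly the same schedule, which is the weakness described above.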

exponential jitter backoff

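Exponential jitter backoff makes up for the weakness of exponential backoff. Each time the next retry interval is computed, a random amount of jitter is added, so that requests that would retry at the same moment are spread out.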

Core code:

// minInterval: the initial interval, e.g. 10ms
// factor: the exponential factor, e.g. 2
// attempts: the number of retries so far
// jitterFactor: the jitter factor, e.g. 0.5

next := float64(minInterval) * math.Pow(factor, attempts)
if jitterFactor > 0 {
    // Pick a value uniformly from [next-j, next+j).
    j := jitterFactor * next
    lo := next - j
    hi := next + j
    next = lo + rand.Float64()*(hi-lo)
}

The output is:

=== RUN   TestExponential_NextWithJitter
11ms
28ms
46ms
75ms
147ms
379ms
362ms
840ms

Implementation

I found a good implementation on GitHub: github.com/jpillora/backoff

// Package backoff provides an exponential-backoff implementation.
package backoff

import (
    "math"
    "math/rand"
    "time"
)

// Backoff is a time.Duration counter, starting at Min. After every call to
// the Duration method the current timing is multiplied by Factor, but it
// never exceeds Max.
//
// Backoff is not generally concurrent-safe, but the ForAttempt method can
// be used concurrently.
type Backoff struct {
    //Factor is the multiplying factor for each increment step
    attempt, Factor float64
    //Jitter eases contention by randomizing backoff steps
    Jitter bool
    //Min and Max are the minimum and maximum values of the counter
    Min, Max time.Duration
}

// Duration returns the duration for the current attempt before incrementing
// the attempt counter. See ForAttempt.
func (b *Backoff) Duration() time.Duration {
    d := b.ForAttempt(b.attempt)
    b.attempt++
    return d
}

const maxInt64 = float64(math.MaxInt64 - 512)

// ForAttempt returns the duration for a specific attempt. This is useful if
// you have a large number of independent Backoffs, but don't want use
// unnecessary memory storing the Backoff parameters per Backoff. The first
// attempt should be 0.
//
// ForAttempt is concurrent-safe.
func (b *Backoff) ForAttempt(attempt float64) time.Duration {
    // Zero-values are nonsensical, so we use
    // them to apply defaults
    min := b.Min
    if min <= 0 {
        min = 100 * time.Millisecond
    }
    max := b.Max
    if max <= 0 {
        max = 10 * time.Second
    }
    if min >= max {
        // short-circuit
        return max
    }
    factor := b.Factor
    if factor <= 0 {
        factor = 2
    }
    //calculate this duration
    minf := float64(min)
    durf := minf * math.Pow(factor, attempt)
    if b.Jitter {
        durf = rand.Float64()*(durf-minf) + minf
    }
    //ensure float64 wont overflow int64
    if durf > maxInt64 {
        return max
    }
    dur := time.Duration(durf)
    //keep within bounds
    if dur < min {
        return min
    }
    if dur > max {
        return max
    }
    return dur
}

// Reset restarts the current attempt counter at zero.
func (b *Backoff) Reset() {
    b.attempt = 0
}

// Attempt returns the current attempt counter value.
func (b *Backoff) Attempt() float64 {
    return b.attempt
}

// Copy returns a backoff with equals constraints as the original
func (b *Backoff) Copy() *Backoff {
    return &Backoff{
        Factor: b.Factor,
        Jitter: b.Jitter,
        Min:    b.Min,
        Max:    b.Max,
    }
}

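Usage example

package main

import (
    "fmt"
    "math/rand"

    "github.com/jpillora/backoff"
)

func main() {
    b := &backoff.Backoff{
        Jitter: true,
    }

    // Seed the global generator so the jittered values are reproducible.
    // rand.Seed is deprecated since Go 1.20 but still works.
    rand.Seed(42)

    fmt.Printf("%s\n", b.Duration())
    fmt.Printf("%s\n", b.Duration())
    fmt.Printf("%s\n", b.Duration())

    fmt.Printf("Reset!\n")
    b.Reset()

    fmt.Printf("%s\n", b.Duration())
    fmt.Printf("%s\n", b.Duration())
    fmt.Printf("%s\n", b.Duration())
}

Output: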

100ms
106.600049ms
281.228155ms
Reset!
100ms
104.381845ms
214.957989ms

References

Exponential Backoff And Jitter
日志采集高可用之重试 (Retries for highly available log collection)
